Earth observation, which aims to monitor the state of planet Earth using remote sensing data, is critical for improving our daily lives and living environment. With a growing number of satellites in orbit, an increasing number of datasets covering diverse sensors and research domains are being published to facilitate the research of the remote sensing community. In this paper, we present a comprehensive review of more than 400 publicly published datasets, including applications like land use/cover, change/disaster monitoring, scene understanding, agriculture, climate change, and weather forecasting. We systematically analyze these Earth observation datasets with respect to five aspects: volume, bibliometric analysis, resolution distributions, research domains, and the correlation between datasets. Based on the dataset attributes, we propose to measure, rank, and select datasets to build a new benchmark for model evaluation. Furthermore, a new platform for Earth observation, termed EarthNets, is released as a means of achieving a fair and consistent evaluation of deep learning methods on remote sensing data. EarthNets supports standard dataset libraries and cutting-edge deep learning models to bridge the gap between the remote sensing and machine learning communities. Based on this platform, extensive deep learning methods are evaluated on the new benchmark. The insightful results are beneficial to future research. The platform and dataset collections are publicly available at https://earthnets.github.io/.
translated by 谷歌翻译
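Training semantic segmentation models with few annotated samples has great potential in a variety of real-world applications. For the few-shot segmentation task, the main challenge is how to accurately measure the semantic correspondence between support and query samples with limited training data. To address this problem, we propose to aggregate learnable covariance matrices with a deformable 4D Transformer to effectively predict the segmentation map. Specifically, in this work, we first design a novel hard example mining mechanism to learn covariance kernels for the Gaussian process. In correspondence measurement, the learned covariance kernel functions have a large advantage over existing cosine-similarity-based methods. Based on the learned covariance kernels, an efficient doubly deformable 4D Transformer module is designed to adaptively aggregate feature similarity maps into segmentation results. By combining these two designs, the proposed method not only sets new state-of-the-art performance on public benchmarks but also converges faster than existing methods. Experiments on three public datasets demonstrate the effectiveness of our method.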
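Vision Transformers (ViTs) have recently dominated a range of computer vision tasks, yet they suffer from low training data efficiency and inferior local semantic representation capability when no appropriate inductive bias is present. Convolutional neural networks (CNNs) inherently capture region-aware semantics, which has inspired researchers to introduce CNNs into the architecture of ViTs to provide the desired inductive bias. However, is the locality achieved by the micro-level CNNs embedded in ViTs good enough? In this paper, we investigate this question by exploring in depth how the macro architecture of hybrid CNNs/ViTs enhances the performance of hierarchical ViTs. In particular, we study the role of the token embedding layers, alias convolutional embedding (CE), and systematically reveal how CE injects the desired inductive bias into ViTs. In addition, we apply the optimal CE configuration to 4 recently released state-of-the-art ViTs, effectively boosting their performance. Finally, a family of efficient hybrid CNNs/ViTs, dubbed CETNet, is released, which can serve as a generic vision backbone. Specifically, CETNet achieves 84.9% Top-1 accuracy on ImageNet-1K (trained from scratch), 48.6% box mAP on the COCO benchmark, and 51.6% mIoU on ADE20K, substantially improving on the performance of the corresponding state-of-the-art baselines.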
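Rapidly generating 3D city models is crucial for many applications. Monocular height estimation is one of the most efficient and timely ways to obtain large-scale geometric information. However, existing works focus primarily on training and testing models on unbiased datasets, which does not align well with real-world applications. Therefore, we propose a new benchmark dataset to study the transferability of height estimation models in a cross-dataset setting. To this end, we first design and construct a large-scale benchmark dataset for cross-dataset transfer learning on the height estimation task. This benchmark dataset includes a newly proposed large-scale synthetic dataset, a newly collected real-world dataset, and four existing datasets from different cities. Next, two new experimental protocols, zero-shot and few-shot cross-dataset transfer, are designed. For few-shot cross-dataset transfer, we enhance the window-based Transformer with the proposed scale-deformable convolution module to handle the severe scale-variation problem. To improve the generalizability of deep models in the zero-shot cross-dataset setting, a max-normalization-based Transformer network is designed to decouple the relative height map from the absolute heights. Experimental results demonstrate the effectiveness of the proposed methods in both the traditional and the cross-dataset transfer settings. The datasets and code are publicly available at https://thebenchmarkh.github.io/.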
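The idea of decoupling a relative height map from absolute heights can be sketched in a few lines. This is only an illustration, assuming that max normalization means dividing each height map by its own maximum; the names `max_normalize` and `restore` are hypothetical and are not taken from the released code.

```python
from typing import List, Tuple

HeightMap = List[List[float]]


def max_normalize(height_map: HeightMap) -> Tuple[HeightMap, float]:
    """Split a height map into a relative map in [0, 1] and a scalar scale.

    Assumption: the scale is the per-image maximum height.
    """
    scale = max(max(row) for row in height_map)
    if scale <= 0:
        # Flat terrain: the relative map is all zeros and carries no scale.
        return [[0.0 for _ in row] for row in height_map], 0.0
    relative = [[h / scale for h in row] for row in height_map]
    return relative, scale


def restore(relative: HeightMap, scale: float) -> HeightMap:
    """Recombine a relative height map with its absolute scale."""
    return [[r * scale for r in row] for row in relative]


# Two toy scenes with the same layout but a tenfold difference in height.
low_rise = [[0.0, 3.0], [6.0, 12.0]]
high_rise = [[0.0, 30.0], [60.0, 120.0]]

rel_low, scale_low = max_normalize(low_rise)
rel_high, scale_high = max_normalize(high_rise)
```

Under this assumption both scenes share the same relative map and differ only in the scalar scale, which is the kind of dataset-specific quantity that a zero-shot model cannot easily infer from a new city.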
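The Earth's surface is continually changing, and identifying changes plays an important role in urban planning and sustainable development. Although change detection techniques have been developed successfully for many years, these techniques are still limited to experts and facilitators in related fields. To give every user flexible access to change information and help them better understand land-cover changes, we introduce a novel task: change detection-based visual question answering (CDVQA) on multi-temporal aerial images. In particular, multi-temporal images can be queried to obtain high-level change-based information according to the content changes between two input images. We first build a CDVQA dataset, consisting of multi-temporal image-question-answer triplets, using an automatic question-answer generation method. Then, a baseline CDVQA framework is designed in this work, which contains four parts: multi-temporal feature encoding, multi-temporal fusion, multi-modal fusion, and answer prediction. In addition, we introduce a change enhancing module into multi-temporal feature encoding, aiming to incorporate more change-related information. Finally, the effects of different backbones and multi-temporal fusion strategies on the performance of the CDVQA task are studied. The experimental results provide useful insights for developing better CDVQA models, which are important for future research on this task. We will make our dataset and code publicly available.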
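Fast arbitrary-shaped text detection has recently become an attractive research topic. However, most existing methods are not real-time, which may fall short in intelligent systems. Although a few real-time text detection methods have been proposed, their detection accuracy lags far behind that of non-real-time methods. To improve detection accuracy and speed simultaneously, we propose a novel fast and accurate text detection framework, namely CM-Net, which is built on a new text representation method and a multi-perspective feature (MPF) module. The former fits arbitrary-shaped text contours with a concentric mask (CM) in an efficient and robust way. The latter encourages the network to learn more CM-related discriminative features from multiple perspectives and adds no extra computational cost. Benefiting from the advantages of CM and MPF, the proposed CM-Net only needs to predict one CM of a text instance to rebuild the text contour, and it achieves the best balance between detection accuracy and speed compared with previous works. Moreover, to ensure that multi-perspective features are learned effectively, a multi-factor constraints loss is proposed. Extensive experiments demonstrate that the proposed CM fits arbitrary-shaped text instances efficiently and robustly, and also validate the effectiveness of MPF and the constraints loss for recognizing discriminative text features. Furthermore, experimental results show that the proposed CM-Net is superior to existing state-of-the-art (SOTA) real-time text detection methods in both detection speed and accuracy on the MSRA-TD500, CTW1500, Total-Text, and ICDAR2015 datasets.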
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent, neural-network-based PDE solvers, suggest that this challenge can be overcome. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment in an untrimmed video given a sentence query. All existing works first use a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with the query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as the corresponding start and end timestamps. The video downsampling process may lose these two frames and take adjacent irrelevant frames as the new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to reasoning bias during frame-query interaction, reducing the generalization ability of the model. To alleviate the above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such a mechanism is also able to supplement the sampled sparse frames with the absent consecutive visual semantics for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
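The boundary-bias issue can be made concrete with a toy sketch. It assumes uniform sparse sampling and assumes that a lost boundary frame is replaced by the nearest sampled frame; both choices, and the function names, are illustrative and are not taken from SSRN.

```python
from typing import List, Tuple


def sparse_sample(num_frames: int, num_samples: int) -> List[int]:
    """Uniformly pick `num_samples` frame indices from `num_frames` frames."""
    stride = num_frames / num_samples
    return [int(i * stride) for i in range(num_samples)]


def shifted_boundaries(sampled: List[int], start: int, end: int) -> Tuple[int, int]:
    """Return the sampled frames nearest to the annotated start and end frames.

    Assumption: when an annotated boundary frame is dropped by sampling,
    the closest surviving frame becomes the new boundary.
    """
    new_start = min(sampled, key=lambda f: abs(f - start))
    new_end = min(sampled, key=lambda f: abs(f - end))
    return new_start, new_end


# A 300-frame video reduced to 10 frames: indices 0, 30, 60, ..., 270.
sampled = sparse_sample(num_frames=300, num_samples=10)

# The annotated segment spans frames 47 to 133, neither of which survives.
new_start, new_end = shifted_boundaries(sampled, start=47, end=133)
```

In this toy case the segment annotated as frames 47 to 133 is represented by frames 60 and 120 after sampling, so both boundaries move by 13 frames and the model is supervised on frames that are not the true start and end.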
Time-series anomaly detection is an important task and has been widely applied in industry. Since manual data annotation is expensive and inefficient, most applications adopt unsupervised anomaly detection methods, but the results are usually sub-optimal and unsatisfactory to end customers. Weak supervision is a promising paradigm for obtaining a considerable number of labels at low cost, which enables customers to label data by writing heuristic rules rather than annotating each instance individually. However, in the time-series domain, it is hard for people to write reasonable labeling functions because time-series data is numerically continuous and difficult to understand. In this paper, we propose a Label-Efficient Interactive Time-Series Anomaly Detection (LEIAD) system, which enables a user to improve the results of unsupervised anomaly detection with only a small number of interactions with the system. To achieve this goal, the system integrates weak supervision and active learning collaboratively while generating labeling functions automatically using only a few labeled data points. All of these techniques are complementary and can promote each other in a reinforced manner. We conduct experiments on three time-series anomaly detection datasets, demonstrating that the proposed system is superior to existing solutions in both the weak supervision and active learning areas. The system has also been tested in a real industrial scenario to show its practicality.
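To make the notion of a labeling function concrete, the sketch below hand-writes two heuristic rules for a time series and combines them by majority vote. It is only an illustration: LEIAD generates its labeling functions automatically, and the rules, thresholds, and the majority-vote aggregation used here are all assumptions rather than the system's method.

```python
from statistics import mean, pstdev
from typing import Callable, List

ANOMALY, NORMAL, ABSTAIN = 1, 0, -1
LabelingFunction = Callable[[List[float], int], int]


def lf_zscore(series: List[float], i: int, threshold: float = 2.0) -> int:
    """Vote ANOMALY when point i lies far from the series mean."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return ABSTAIN  # a constant series gives no evidence either way
    return ANOMALY if abs(series[i] - mu) / sigma > threshold else NORMAL


def lf_jump(series: List[float], i: int, max_jump: float = 5.0) -> int:
    """Vote ANOMALY when point i jumps sharply from its predecessor."""
    if i == 0:
        return ABSTAIN  # the first point has no predecessor
    return ANOMALY if abs(series[i] - series[i - 1]) > max_jump else NORMAL


def majority_vote(series: List[float], i: int, lfs: List[LabelingFunction]) -> int:
    """Combine non-abstaining votes; ties resolve to NORMAL."""
    votes = [v for v in (lf(series, i) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return ANOMALY if sum(votes) * 2 > len(votes) else NORMAL


series = [1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1, 0.9]
labels = [majority_vote(series, i, [lf_zscore, lf_jump]) for i in range(len(series))]
```

Note that `lf_jump` alone would also flag the point right after the spike, because the drop back to normal is itself a large jump. The vote suppresses that false positive, which hints at why individual rules are noisy and why writing good ones by hand is difficult for continuous data.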
This paper investigates the use of artificial neural networks (ANNs) to solve differential equations (DEs) and the construction of a loss function that enforces both a given DE and its initial/boundary conditions. In Section 2, the loss function is generalized to $n^\text{th}$-order ordinary differential equations (ODEs). Other methods of construction are examined in Section 3 and applied to three different models to assess their effectiveness.
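One common way to write such a loss, given here only as a sketch and not necessarily the construction used in the paper, penalizes the ODE residual at collocation points together with the mismatch in the initial conditions. The symbols $F$, $N(x;\theta)$, $x_i$, $M$, $a$, and $c_k$ are assumptions introduced for illustration: $F$ is the ODE written in implicit form, $N$ is the network output with parameters $\theta$, $x_i$ are $M$ collocation points, and $c_k$ are the prescribed initial values at $x = a$.

```latex
% n-th order ODE in implicit form with initial conditions at x = a
F\bigl(x,\, y,\, y',\, \dots,\, y^{(n)}\bigr) = 0,
\qquad y^{(k)}(a) = c_k, \quad k = 0, 1, \dots, n-1.

% Loss: mean squared ODE residual plus squared initial-condition errors
\mathcal{L}(\theta)
  = \frac{1}{M} \sum_{i=1}^{M}
      F\bigl(x_i,\, N(x_i;\theta),\, N'(x_i;\theta),\, \dots,\, N^{(n)}(x_i;\theta)\bigr)^{2}
  + \sum_{k=0}^{n-1} \bigl(N^{(k)}(a;\theta) - c_k\bigr)^{2}.
```

Both terms vanish exactly when the network satisfies the equation at every collocation point and matches all $n$ initial conditions, so minimizing $\mathcal{L}$ drives the network toward a solution of the initial value problem.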